Hallin and Ley (2012) investigate and fully characterize the Fisher singularity phenomenon in univariate and multivariate families of skew-symmetric distributions. This paper proposes a refined analysis of the (univariate) Fisher degeneracy problem, showing that it can be more or less severe, inducing $n^{1/4}$ ("simple singularity"), $n^{1/6}$ ("double singularity"), or $n^{1/8}$ ("triple singularity") consistency rates for the skewness parameter. We show, however, that simple singularity (yielding $n^{1/4}$ consistency rates), if any singularity at all, is the rule, in the sense that double and triple singularities are possible for generalized skew-normal families only. We also show that higher-order singularities, leading to worse-than-$n^{1/8}$ rates, cannot occur.
1. Introduction.

The skew-symmetric families, originally proposed in Azzalini and Capitanio (2003) and Wang et al. (2004), are, in their univariate version, parametric families of probability density functions (pdfs) of the form

$$f^{\Pi}_{\vartheta}(x) := 2\sigma^{-1} f\big(\sigma^{-1}(x-\mu)\big)\,\Pi\big(\sigma^{-1}(x-\mu),\delta\big), \qquad x\in\mathbb{R}, \tag{1.1}$$

where

(a) $\vartheta = (\mu,\sigma,\delta)'$, with $\mu\in\mathbb{R}$ a location parameter, $\sigma\in\mathbb{R}_0^+$ a scale parameter, while $\delta\in\mathbb{R}$ plays the role of a skewness parameter;

(b) $f:\mathbb{R}\to\mathbb{R}_0^+$, the symmetric kernel, is a symmetric nonvanishing pdf (such that, for any $z\in\mathbb{R}$, $0\neq f(-z)=f(z)$), and

(c) $\Pi:\mathbb{R}\times\mathbb{R}\to[0,1]$ is a skewing function, that is, satisfies

$$\Pi(-z,\delta)+\Pi(z,\delta)=1, \quad z,\delta\in\mathbb{R}, \qquad\text{and}\qquad \Pi(z,0)=1/2, \quad z\in\mathbb{R}, \tag{1.2}$$

and, in case $(z,\delta)\mapsto\Pi(z,\delta)$ admits a derivative of order $s$ at $\delta=0$ for all $z\in\mathbb{R}$,

$$\partial_z^s\Pi(z,\delta)\big|_{\delta=0}=0, \quad z\in\mathbb{R}, \qquad\text{and, for $s$ even,}\qquad \partial_\delta^s\Pi(z,\delta)\big|_{\delta=0}=0, \quad z\in\mathbb{R}. \tag{1.3}$$

While condition (1.2) is classical, (1.3), which involves the derivatives of $\Pi$, is less usual. The main justification for it lies in the analogy with skewing functions of the form $\Pi(z,\delta)=\Pi(\delta z)$, by far the most common ones. If $\Pi$ is $s$ times continuously differentiable, $\partial_z^s\Pi(\delta z)=\delta^s(\partial^s\Pi)(\delta z)$ obviously vanishes at $\delta=0$. Similarly, the fact that $\Pi(-y)+\Pi(y)=1$ implies that $\partial_\delta^s\Pi(\delta z)$ cancels at $\delta=0$ for even values of $s$. All skewing functions considered in the literature, as well as those appearing in the examples developed in this paper and in Hallin and Ley (2012), satisfy (1.3). Further comments on the skewing functions of the form $\Pi(z,\delta)=\Pi(\delta z)$ can be found in Section 5.5.
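As an illustration of definition (1.1) and condition (1.2), the following minimal Python sketch builds a skew-symmetric density from a kernel and a skewing function and checks numerically that it integrates to one. The function names, the logistic skewing function, and the parameter values are assumptions chosen for illustration only; they do not come from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def skew_symmetric_pdf(x, mu, sigma, delta, kernel, skewing):
    """Density (1.1): (2 / sigma) * f(z) * Pi(z, delta), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 2.0 / sigma * kernel(z) * skewing(z, delta)


def gaussian_kernel(z):
    # Symmetric kernel f = standard Gaussian pdf.
    return norm.pdf(z)


def normal_skewing(z, delta):
    # Pi(z, delta) = Phi(delta * z): the skew-normal choice.
    return norm.cdf(delta * z)


def logistic_skewing(z, delta):
    # Pi(z, delta) = 1 / (1 + exp(-delta * z)): an illustrative alternative.
    return 1.0 / (1.0 + np.exp(-delta * z))


def check_skewing(skewing, z=0.8, delta=1.7):
    """Return the two quantities that (1.2) requires to equal 1 and 1/2."""
    return skewing(-z, delta) + skewing(z, delta), skewing(z, 0.0)


def total_mass(skewing, mu=0.5, sigma=2.0, delta=3.0):
    """Numerically integrate the density (1.1) over the real line."""
    value, _ = quad(skew_symmetric_pdf, -np.inf, np.inf,
                    args=(mu, sigma, delta, gaussian_kernel, skewing))
    return value


for skewing in (normal_skewing, logistic_skewing):
    print(skewing.__name__, check_skewing(skewing), total_mass(skewing))
```

The unit total mass does not depend on the particular skewing function: it follows from the symmetry of $f$ combined with the first identity in (1.2).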

The skew-normal family of Azzalini (1985), for which the symmetric kernel $f$ is the standard Gaussian pdf $\phi$ and the skewing function is $\Pi(z,\delta)=\Phi(\delta z)$, with $\Phi$ the standard Gaussian cumulative distribution function (cdf), is the oldest and most popular example of such a skew-symmetric family; varying $f$ and $\Pi$, however, yields a virtually infinite number of them. Traditional examples include the skew-exponential power distributions of Azzalini (1986), the skew-Cauchy distributions of Arnold and Beaver (2000), the skew-$t$ densities of Azzalini and Capitanio (2003), or the generalized skew-normal distributions of Loperfido (2004). We refer to Genton (2004), Azzalini (2005) or Ley (2012) for background reading, details and examples.

Since the pioneering paper by Azzalini (1985), it is well known that the scalar skew-normal distribution suffers from a Fisher information singularity problem at $\delta=0$. More precisely, the Fisher information matrix for the three-parameter density (1.1) in the scalar skew-normal case is singular (typically, with rank 2 instead of 3) in the vicinity of symmetry, that is, at $\delta=0$. Such a singularity violates the standard assumptions for root-$n$ asymptotic inference, and skew-normal distributions therefore are problematic from an inferential point of view; in particular, any nontrivial traditional test of the null hypothesis of symmetry, at first sight, seems impossible.
That degeneracy problem has been discussed at length in a number of papers, among which Azzalini and Capitanio (1999), Pewsey (2000), DiCiccio and Monti (2004), Chiogna (2005), Azzalini and Genton (2008) or Ley and Paindaveine (2010); see Hallin and Ley (2012) for a detailed account. While all authors were pointing at some special status for normal kernels, hence skew-normal distributions, Hallin and Ley (2012) have shown that this information deficiency has no special relation to the skew-normal case, but actually originates in an unfortunate mismatch between $f$ and $\Pi$; more precisely, between two densities, the kernel $f$ and an exponential density $g_{\Pi}$ associated with the skewing function $\Pi$ (see Section 2.1).

The deficiency of the Fisher information matrix results in slower consistency rates in the estimation of the skewness parameter (at $\delta=0$); equivalently, it yields slower local alternative rates (contiguity rates) in tests of the null hypothesis of symmetry ($\delta=0$). That impact of singular Fisher information on consistency/contiguity rates has been studied, in a general context, for the particular case of a deficiency of order one, by Rotnitzky et al. (2000), who unify and reinforce earlier proposals by, e.g., Cox and Hinkley (1974, pp. 117-118) or Lee and Chesher (1986).

The typical rate, corresponding to a "simple singularity", would be $n^{1/4}$. However, it is well known from, e.g., Chiogna (2005) that, for skew-normal distributions, that rate (for the estimation of $\delta$ at $\delta=0$) drops down to $n^{1/6}$. In order to understand and explain this intriguing phenomenon, we pursue and refine, in the present paper, the analysis of Fisher singularity initiated in Hallin and Ley (2012). We show that this deterioration from $n^{1/4}$ to $n^{1/6}$ is explained by a "double singularity" property (a terminology that will become clear in the course of this paper): the double sin of the skew-normal. That $n^{1/6}$ rate in turn possibly can drop further down to $n^{1/8}$, a case of "triple singularity". This, however, as we show in Theorem 4.1, is the worst case: "fourfold singularities" (quadruple sins), yielding $n^{1/10}$ rates or worse, are impossible.

Our aim is to characterize, in the spirit of Hallin and Ley (2012), among all families of univariate skew-symmetric distributions suffering from Fisher singularity, those exhibiting that double/triple singularity phenomenon, and to show that there exist no higher-order ones. It turns out that only Gaussian kernels can exhibit double (hence, also triple) degeneracy. The skew-normal family is one example; other ones are found in the class of generalized skew-normal distributions (Loperfido 2004). We also provide (in the spirit of Rotnitzky et al. 2000) the reparametrizations and the scores taking care of simple, double and triple singularities and achieving the $n^{1/4}$, $n^{1/6}$ and $n^{1/8}$ consistency/contiguity rates, respectively.
The paper is organized as follows. Section 2 deals with the simple singularity case, Section 3 with double singularity. Section 4 analyzes the triple singularity case and shows that higher-order ones are excluded. Examples for each type of singularity, and a discussion of the most standard type of skewing function, are provided in Section 5.

2. The simple singularity case. 

In this section, we first briefly revisit the main result of Hallin and Ley (2012); we then show how to remove the singularity problem via an adequate reparametrization leading, in general, to $n^{1/4}$ consistency rates for the skewness parameter in the vicinity of symmetry.

2.1. Simple singularity: a mismatch between $f$ and $\Pi$.

Throughout, we consider the skew-symmetric distributions with pdf (1.1), along with regularity assumptions on $f$ and $\Pi$ that will be tightened from section to section. The minimal regularity assumptions we need are those of Hallin and Ley (2012).

Assumption (A1). (i) The symmetric kernel $f$ is a standardized symmetric pdf. (ii) The mapping $z\mapsto f(z)$ is continuously differentiable, with derivative $\dot f$, at all $z\in\mathbb{R}$. (iii) Letting $\varphi_f := -\dot f/f$, the information quantities $\sigma^{-2}\mathcal{I}_f$ for location and $\sigma^{-2}\mathcal{J}_f$ for scale, with

$$\mathcal{I}_f := \int_{-\infty}^{\infty}\varphi_f^2(z)f(z)\,dz \qquad\text{and}\qquad \mathcal{J}_f := \int_{-\infty}^{\infty}\big(z\varphi_f(z)-1\big)^2 f(z)\,dz,$$

are finite.

Assumption (A2). (i) The mapping $(z,\delta)\mapsto\Pi(z,\delta)$ is continuously differentiable at $\delta=0$ for all $z\in\mathbb{R}$; (ii) the derivative $\partial_\delta\Pi(z,\delta)|_{\delta=0} =: \psi(z)$ admits a primitive $\Psi$; (iii) the quantity $\int_{-\infty}^{\infty}\psi^2(z)f(z)\,dz$ is finite.

Regarding Assumption (A1)(i), the term "standardized" means that the scale parameter (not necessarily a standard error, so that finite second-order moments are not required) of the symmetric kernel equals one. This is an identification constraint for $\sigma$ that does not imply any loss of generality; see Hallin and Ley (2012) for a discussion of possible choices of scale parameters. All other assumptions ensure the existence and finiteness of Fisher information for the original parametrization.

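Under Assumptions (A1) and (A2), the score vector $\ell_{f;\vartheta_0}$, at $(\mu,\sigma,0)' =: \vartheta_0$, takes the form

$$\ell_{f;\vartheta_0}(x) := \nabla_{\vartheta}\log f^{\Pi}_{\vartheta}(x)\big|_{\vartheta_0} =: \begin{pmatrix}\ell^{1}_{f;\vartheta_0}(x)\\ \ell^{2}_{f;\vartheta_0}(x)\\ \ell^{3}_{f;\vartheta_0}(x)\end{pmatrix} = \begin{pmatrix}\sigma^{-1}\varphi_f\big(\sigma^{-1}(x-\mu)\big)\\ \sigma^{-1}\big(\sigma^{-1}(x-\mu)\,\varphi_f(\sigma^{-1}(x-\mu))-1\big)\\ 2\psi\big(\sigma^{-1}(x-\mu)\big)\end{pmatrix},$$

where the factor 2 in $\ell^{3}_{f;\vartheta_0}$ follows from the fact that $\Pi(z,0)=1/2$ for all $z\in\mathbb{R}$. Note that the skewing function $\Pi$ plays no role in the score functions for $\mu$ and $\sigma$ at $\delta=0$. The resulting $3\times 3$ Fisher information matrix then exists, is finite, and takes the form

$$\Gamma_{f;\vartheta_0} := \sigma^{-1}\int_{-\infty}^{\infty}\ell_{f;\vartheta_0}(x)\,\ell'_{f;\vartheta_0}(x)\,f\big(\sigma^{-1}(x-\mu)\big)\,dx = \begin{pmatrix}\gamma^{11}_{f;\vartheta_0} & 0 & \gamma^{13}_{f;\vartheta_0}\\ 0 & \gamma^{22}_{f;\vartheta_0} & 0\\ \gamma^{13}_{f;\vartheta_0} & 0 & \gamma^{33}_{f;\vartheta_0}\end{pmatrix},$$

with

$$\gamma^{11}_{f;\vartheta_0} = \sigma^{-2}\mathcal{I}_f, \qquad \gamma^{22}_{f;\vartheta_0} = \sigma^{-2}\mathcal{J}_f, \qquad \gamma^{33}_{f;\vartheta_0} = 4\int_{-\infty}^{\infty}\psi^2(z)f(z)\,dz,$$

and

$$\gamma^{13}_{f;\vartheta_0} = 2\sigma^{-1}\int_{-\infty}^{\infty}\varphi_f(z)\psi(z)f(z)\,dz.$$
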

The zeroes in $\Gamma_{f;\vartheta_0}$ are easily obtained by noting that $\ell^{1}_{f;\vartheta_0}$ and $\ell^{3}_{f;\vartheta_0}$ are odd functions of $(x-\mu)$, whereas $\ell^{2}_{f;\vartheta_0}$ is even with respect to the same quantity. Consequently, Fisher singularity only can be caused by the collinearity of $\ell^{1}_{f;\vartheta_0}$ and $\ell^{3}_{f;\vartheta_0}$. Starting from that elementary observation, Hallin and Ley (2012) show that the family of densities (1.1) characterized by a couple $(f,\Pi)$ suffers from Fisher singularity at $\delta=0$ if and only if the symmetric kernel $f$ belongs to the exponential family

$$\mathcal{E}_{\Psi} := \Big\{ g_a := \exp(-a\Psi)\Big/\int_{-\infty}^{\infty}\exp(-a\Psi(z))\,dz \ \Big|\ a\in A \Big\} \tag{2.4}$$

with minimal sufficient statistic $\Psi$, natural parameter $-a$, and natural parameter space

$$A := \Big\{ a\in\mathbb{R} \text{ such that } \int_{-\infty}^{\infty}\exp(-a\Psi(z))\,dz < \infty \Big\},$$

yielding

$$\gamma^{11}_{f;\vartheta_0} = \sigma^{-2}a^{2}\int_{-\infty}^{\infty}\psi^2(z)f(z)\,dz \qquad\text{and}\qquad \gamma^{13}_{f;\vartheta_0} = 2\sigma^{-1}a\int_{-\infty}^{\infty}\psi^2(z)f(z)\,dz. \tag{2.5}$$

We refer the reader to the end of Section 2.1 in Hallin and Ley (2012) for comments and a discussion on the existence of couples $(f,\Pi)$ such that $f\in\mathcal{E}_{\Psi}$ for given $f$ and for given $\Pi$, respectively.
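The singularity can be observed numerically in the skew-normal case. The sketch below is an illustration under stated assumptions, not part of the paper: it takes $f=\phi$ and $\Pi(z,\delta)=\Phi(\delta z)$, so that the kernel score is $z$ and $\psi(z)=\phi(0)z$, computes the information matrix at $\delta=0$ by quadrature, and reports its rank. The function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def scores_at_symmetry(z, sigma=1.0):
    """Score vector at delta = 0 for the skew-normal, as a function of z = (x - mu) / sigma.

    Assumes f = standard Gaussian pdf (kernel score equal to z) and
    Pi(z, delta) = Phi(delta * z), hence psi(z) = phi(0) * z.
    """
    psi = norm.pdf(0.0) * z
    return np.array([z / sigma, (z * z - 1.0) / sigma, 2.0 * psi])


def fisher_information(sigma=1.0):
    """Entry (i, j) is the integral of score_i * score_j against the kernel."""
    gamma = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            integrand = lambda z: (scores_at_symmetry(z, sigma)[i]
                                   * scores_at_symmetry(z, sigma)[j]
                                   * norm.pdf(z))
            gamma[i, j], _ = quad(integrand, -np.inf, np.inf)
    return gamma


gamma = fisher_information()
print(np.round(gamma, 6))
print("rank:", np.linalg.matrix_rank(gamma, tol=1e-8))
```

In this example the location and skewness scores are proportional, so the first and third rows of the matrix are proportional as well, which is the collinearity described above.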
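2.2. Towards a singularity-free reparametrization: orthogonalization.

A natural way to handle this singularity problem consists in reparametrizing (1.1) in the spirit of Rotnitzky et al. (2000). Assume that $f$ and $\Pi$ are such that $f\in\mathcal{E}_{\Psi}$. The collinearity between the score for location and the score for skewness can be taken care of by a Gram-Schmidt orthogonalization process applied to the three components of $\ell_{f;\vartheta_0}$. This process projects, in the $L^2$ geometry of the information matrix, the score for skewness $\ell^{3}_{f;\vartheta_0}$ onto the subspace orthogonal (at $\vartheta_0$) to the scores for location and scale $\ell^{1}_{f;\vartheta_0}$ and $\ell^{2}_{f;\vartheta_0}$, so that the score for skewness becomes orthogonal to the score for location (since it is already orthogonal to $\ell^{2}_{f;\vartheta_0}$). The resulting score for skewness is

$$\ell^{3(1)}_{f;\vartheta_0}(x) := \ell^{3}_{f;\vartheta_0}(x) - \frac{\gamma^{13}_{f;\vartheta_0}}{\gamma^{11}_{f;\vartheta_0}}\,\ell^{1}_{f;\vartheta_0}(x),$$

while the other two scores remain unchanged: $\ell^{1(1)}_{f;\vartheta_0} = \ell^{1}_{f;\vartheta_0}$, $\ell^{2(1)}_{f;\vartheta_0} = \ell^{2}_{f;\vartheta_0}$. As expected, in view of (2.5),

$$\ell^{3(1)}_{f;\vartheta_0}(x) = 2\psi\big(\sigma^{-1}(x-\mu)\big) - \sigma^{-1}\varphi_f\big(\sigma^{-1}(x-\mu)\big)\,\frac{2\sigma^{-1}a\int_{-\infty}^{\infty}\psi^2(z)f(z)\,dz}{\sigma^{-2}a^{2}\int_{-\infty}^{\infty}\psi^2(z)f(z)\,dz} = 0. \tag{2.6}$$

This orthogonal system of scores is associated (at $\vartheta_0$) with the reparametrization $\vartheta^{(1)} := (\mu^{(1)},\sigma^{(1)},\delta^{(1)})'$, with

$$\mu^{(1)} = \mu + 2\delta\sigma/a, \qquad \sigma^{(1)} = \sigma, \qquad\text{and}\qquad \delta^{(1)} = \delta,$$

hence with

$$f^{\Pi}_{\vartheta^{(1)}}(x) := 2(\sigma^{(1)})^{-1} f\big((x-\mu^{(1)}+2\delta^{(1)}\sigma^{(1)}/a)/\sigma^{(1)}\big)\,\Pi\big((x-\mu^{(1)}+2\delta^{(1)}\sigma^{(1)}/a)/\sigma^{(1)},\delta^{(1)}\big);$$

it is easily checked, indeed, that $\partial_{\delta^{(1)}}\log f^{\Pi}_{\vartheta^{(1)}}(x)\big|_{\delta^{(1)}=0} = \ell^{3(1)}_{f;\vartheta_0}(x)$. Note that, under $\delta=\delta^{(1)}=0$ (but not in a neighborhood thereof), $\vartheta^{(1)}$ and $\vartheta=\vartheta_0$ coincide.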
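The effect of this first reparametrization can be checked by finite differences. The following sketch assumes the skew-normal case ($f=\phi$, $\Pi(z,\delta)=\Phi(\delta z)$, hence $a=\sqrt{2\pi}$) and compares the score for skewness at $\delta=0$ in the original and in the shifted-location parametrizations. The helper names and the step size are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

A = np.sqrt(2.0 * np.pi)  # constant a for the skew-normal, where psi(z) = z / a


def logpdf_original(x, mu, sigma, delta):
    """Log of the skew-normal density in the original (mu, sigma, delta) parametrization."""
    z = (x - mu) / sigma
    return np.log(2.0) - np.log(sigma) + norm.logpdf(z) + norm.logcdf(delta * z)


def logpdf_reparam(x, mu1, sigma1, delta1):
    """Same density with mu1 = mu + 2 * delta * sigma / a, sigma1 = sigma, delta1 = delta."""
    mu = mu1 - 2.0 * delta1 * sigma1 / A
    return logpdf_original(x, mu, sigma1, delta1)


def delta_score(logpdf, x, h=1e-5):
    """Central finite difference in delta at delta = 0, with location 0 and scale 1."""
    return (logpdf(x, 0.0, 1.0, h) - logpdf(x, 0.0, 1.0, -h)) / (2.0 * h)


xs = np.linspace(-3.0, 3.0, 7)
original_scores = delta_score(logpdf_original, xs)
reparam_scores = delta_score(logpdf_reparam, xs)
print("original:      ", np.round(original_scores, 6))
print("reparametrized:", np.round(reparam_scores, 6))
```

In the original parametrization the score equals $2\psi(x)=\sqrt{2/\pi}\,x$, whereas after the location shift it vanishes up to finite-difference error.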

Since this reparametrization, which only affects the location parameter, cancels (at $\vartheta^{(1)}_0 := (\mu^{(1)},\sigma^{(1)},0)' = (\mu,\sigma,0)' = \vartheta_0$) the score for skewness, second derivatives with respect to $\delta^{(1)}=\delta$ naturally come into the picture in the Taylor expansion of the log-likelihood. To be precise, the score $\ell^{3(1)}_{f;\vartheta_0}(x) = \partial_{\delta}\log f^{\Pi}_{\vartheta^{(1)}}(x)\big|_{\vartheta^{(1)}_0}$ is supposed to provide a linear term $\tau_3\,\ell^{3(1)}_{f;\vartheta_0}(x)$ in the Taylor expansion of $\log f^{\Pi}_{\vartheta^{(1)}_0+(0,0,\tau_3)'}(x)$ about $\log f^{\Pi}_{\vartheta^{(1)}_0}(x)$. Since that linear term happens to be zero, the best approximation is provided by the quadratic term $\frac{\tau_3^2}{2}\,\partial^2_{\delta}\log f^{\Pi}_{\vartheta^{(1)}}(x)\big|_{\vartheta^{(1)}_0}$. The quantity $\frac{1}{2}\partial^2_{\delta}\log f^{\Pi}_{\vartheta^{(1)}}(x)\big|_{\vartheta^{(1)}_0}$ thus plays the role of a score function in that approximation, at $\vartheta^{(1)}_0$: not for $\delta^{(1)}$, though, but for $(\delta^{(1)})^2$. Note indeed that, in view of (2.6),

$$\mathrm{E}_{\vartheta^{(1)}_0}\Big[\partial^2_{\delta}\log f^{\Pi}_{\vartheta^{(1)}}(X)\big|_{\vartheta^{(1)}_0}\Big] = -\,\mathrm{E}_{\vartheta^{(1)}_0}\Big[\big(\partial_{\delta}\log f^{\Pi}_{\vartheta^{(1)}}(X)\big|_{\vartheta^{(1)}_0}\big)^2\Big] = 0,$$

which is an essential property of score functions. As a result, if the impact, on the log-likelihood of an i.i.d. sample of size $n$, of a perturbation $\tau_3$ of $\delta=0$ is to exhibit the central-limit magnitude of $n^{-1/2}$, $\tau_3$ itself has to be of magnitude $n^{-1/4}$ only; moreover, information about its sign is lost (a phenomenon which is also stressed by Rotnitzky et al. 2000). This is the structural reason for slower-than-$n^{1/2}$ consistency rates (at $\vartheta^{(1)}_0=\vartheta_0$) for the skewness parameter $\delta$ in the singular case: see the next section for details.



2.3. Towards a singularity-free reparametrization: second-order scores.

Second-order derivatives thus quite naturally enter the scene in case of degenerate Fisher information. The existence of derivatives of order two, however, requires reinforcing the regularity assumptions (A1) and (A2) on $f$ and $\Pi$.

The reinforced regularity assumptions we need to reparametrize (at $\vartheta^{(1)}_0=\vartheta_0$) the family (1.1) are as follows. Recall that we only address the case under which $f$ and $\Pi$ are such that $f=g_a\in\mathcal{E}_{\Psi}$ for some $a\in A$ (see (2.4)): $f$ thus is now entirely determined by $\Pi$ and the constant $a$, and we only need to strengthen (A2).

Assumption (A2$^{+}$). Same as (A2) but moreover (i) the mapping $(z,\delta)\mapsto\Pi(z,\delta)$ is twice continuously differentiable at $(z,0)$, $z\in\mathbb{R}$; (ii) denoting by $z\mapsto\dot\psi(z)=\partial_\delta\partial_z\Pi(z,\delta)|_{\delta=0}$ the derivative of $\psi$, the quantities $\int_{-\infty}^{\infty}\psi^2(z)z^2f(z)\,dz$ and $\int_{-\infty}^{\infty}\big(2a^{-1}\dot\psi(z)-2\psi^2(z)\big)^2f(z)\,dz$ are finite.

Assumption (A2$^{+}$)(i) ensures the existence of the second derivative $\partial^2_{\delta}f^{\Pi}_{\vartheta^{(1)}}(x)\big|_{\vartheta^{(1)}_0}$, while Assumption (A2$^{+}$)(ii) guarantees finiteness of the corresponding covariance matrix. Assumption (A2$^{+}$)(i) also entails $\partial_\delta\partial_z\Pi(z,\delta)|_{\delta=0}=\partial_z\partial_\delta\Pi(z,\delta)|_{\delta=0}$ for all $z\in\mathbb{R}$, so that this mixed derivative indeed coincides with $\dot\psi(z)$ (see (A2$^{+}$)(ii)). As already pointed out, Assumption (A2$^{+}$) not only reinforces (A2) but also, via the requirement that $f=g_a\in\mathcal{E}_{\Psi}$ for some $a\in A$, entails (A1), which is no longer needed.

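Now, in line with Section 2.1, and under Assumption (A2$^{+}$), let

$$\ell_{f;\vartheta^{(1)}_0}(x) := \begin{pmatrix}\partial_{\mu^{(1)}}\log f^{\Pi}_{\vartheta^{(1)}}(x)\\ \partial_{\sigma^{(1)}}\log f^{\Pi}_{\vartheta^{(1)}}(x)\\ \frac{1}{2}\partial^2_{\delta^{(1)}}\log f^{\Pi}_{\vartheta^{(1)}}(x)\end{pmatrix}\Bigg|_{\vartheta^{(1)}_0} =: \begin{pmatrix}\ell^{1}_{f;\vartheta^{(1)}_0}(x)\\ \ell^{2}_{f;\vartheta^{(1)}_0}(x)\\ \ell^{3}_{f;\vartheta^{(1)}_0}(x)\end{pmatrix} = \begin{pmatrix}\sigma^{-1}a\,\psi\big(\sigma^{-1}(x-\mu)\big)\\ \sigma^{-1}\big(\sigma^{-1}(x-\mu)\,a\,\psi(\sigma^{-1}(x-\mu))-1\big)\\ \frac{2}{a}\dot\psi\big(\sigma^{-1}(x-\mu)\big)-2\psi^2\big(\sigma^{-1}(x-\mu)\big)\end{pmatrix} \tag{2.7}$$

with covariance

$$\Gamma_{f;\vartheta^{(1)}_0} := \sigma^{-1}\int_{-\infty}^{\infty}\ell_{f;\vartheta^{(1)}_0}(x)\,\ell'_{f;\vartheta^{(1)}_0}(x)\,f\big(\sigma^{-1}(x-\mu)\big)\,dx =: \begin{pmatrix}\gamma^{11}_{f;\vartheta^{(1)}_0} & 0 & 0\\ 0 & \gamma^{22}_{f;\vartheta^{(1)}_0} & \gamma^{23}_{f;\vartheta^{(1)}_0}\\ 0 & \gamma^{23}_{f;\vartheta^{(1)}_0} & \gamma^{33}_{f;\vartheta^{(1)}_0}\end{pmatrix},$$

where $\gamma^{11}_{f;\vartheta^{(1)}_0}$ and $\gamma^{22}_{f;\vartheta^{(1)}_0}$ coincide with $\gamma^{11}_{f;\vartheta_0}$ and $\gamma^{22}_{f;\vartheta_0}$, and (finiteness of the integrals below follows from (A2$^{+}$)(ii))

$$\gamma^{23}_{f;\vartheta^{(1)}_0} = 2\sigma^{-1}\int_{-\infty}^{\infty}\big(a z\psi(z)-1\big)\big(a^{-1}\dot\psi(z)-\psi^2(z)\big)f(z)\,dz$$

and

$$\gamma^{33}_{f;\vartheta^{(1)}_0} = 4\int_{-\infty}^{\infty}\big(a^{-1}\dot\psi(z)-\psi^2(z)\big)^2 f(z)\,dz.$$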



First, let us assume that $\Gamma_{f;\vartheta^{(1)}_0}$ has full rank. Denoting by $X_1,\ldots,X_n$ an i.i.d. sample of size $n$ from $f^{\Pi}_{\vartheta^{(1)}}$, the vector defined in (2.7) provides a linear term, of the form $(\tau_1,\tau_2,\tau_3^2)\sum_{i=1}^{n}\ell_{f;\vartheta^{(1)}_0}(X_i)$, to the Taylor expansion of the log-likelihood $\sum_{i=1}^{n}\log f^{\Pi}_{\vartheta^{(1)}_0+\tau}(X_i)$ with respect to $\sum_{i=1}^{n}\log f^{\Pi}_{\vartheta^{(1)}_0}(X_i)$. In order for that linear term to exhibit the required traditional central-limit behavior, the perturbation $\tau := (\tau_1,\tau_2,\tau_3)'$ has to be of the order $(n^{-1/2},n^{-1/2},n^{-1/4})'$, hence must be of the form $\tau = (n^{-1/2}t_1, n^{-1/2}t_2, n^{-1/4}t_3)'$, yielding $(t_1,t_2,t_3^2)\,n^{-1/2}\sum_{i=1}^{n}\ell_{f;\vartheta^{(1)}_0}(X_i)$ which, in view of the fact that $\ell_{f;\vartheta^{(1)}_0}$ has expectation zero and finite full-rank variance $\Gamma_{f;\vartheta^{(1)}_0}$, is asymptotically normal under $\vartheta^{(1)}_0$, as should be for the linear term of local log-likelihood expansions under the assumptions of the classical MLE theory.

This also naturally suggests a test rejecting the null hypothesis of symmetry (in favor of an asymmetry of unspecified sign) whenever the quadratic statistic (of the Lagrange Multiplier type; $\hat\vartheta = (\hat\mu,\hat\sigma,0)'$ stands for a root-$n$ consistent estimator of $\vartheta^{(1)}_0$ under $\delta=0$)

[display equation missing in the source]

exceeds the chi-square quantile (one degree of freedom) of order $(1-\alpha)$. For all those reasons, the terminology "score vector" adequately can be used for $\ell_{f;\vartheta^{(1)}_0}$.

However, score vectors, in the classical MLE theory as well as in Le Cam's theory of locally asymptotically normal experiments, enjoy stronger properties, ensuring, in particular, the optimal nature of the test just described. Those properties rely on the quadratic approximation (as $n\to\infty$, under $\vartheta^{(1)}_0$) of local log-likelihood ratios which, in the present case, should take the form

$$\sum_{i=1}^{n}\log f^{\Pi}_{\vartheta^{(1)}_0+(n^{-1/2}t_1,\,n^{-1/2}t_2,\,n^{-1/4}t_3)'}(X_i) = \sum_{i=1}^{n}\log f^{\Pi}_{\vartheta^{(1)}_0}(X_i) + (t_1,t_2,t_3^2)\,n^{-1/2}\sum_{i=1}^{n}\ell_{f;\vartheta^{(1)}_0}(X_i) - \frac{1}{2}(t_1,t_2,t_3^2)\,\Gamma_{f;\vartheta^{(1)}_0}\,(t_1,t_2,t_3^2)' + o_{\mathrm{P}}(1),$$

where $\Gamma_{f;\vartheta^{(1)}_0}$ is the covariance matrix of $\ell_{f;\vartheta^{(1)}_0}$. This quadratic approximation does not hold here without additional assumptions on log-likelihood derivatives of orders three and four. This point is investigated in detail in Hallin, Ley and Monti (2012), for the particular case of the skew-normal, and we will not pursue it any further here.

We have assumed, so far, that $\Gamma_{f;\vartheta^{(1)}_0}$ has full rank. In most cases, the components of the new score vector $(\ell^{1}_{f;\vartheta^{(1)}_0},\ell^{2}_{f;\vartheta^{(1)}_0},\ell^{3}_{f;\vartheta^{(1)}_0})'$ are not collinear anymore, so that $\Gamma_{f;\vartheta^{(1)}_0}$ indeed is non-singular; our objective of a singularity-free parametrization then is achieved, with consistency rate (for $\delta$, at $\vartheta_0$) $n^{1/4}=(n^{1/2})^{1/2}$. But this is not a general rule: in the case of the skew-normal family, for instance, Chiogna (2005) showed that the correct rate is only $n^{1/6}$. The explanation, as we shall see, lies in a double singularity phenomenon, which occurs when $\ell^{2}_{f;\vartheta^{(1)}_0}$ and $\ell^{3}_{f;\vartheta^{(1)}_0}$ in turn are collinear (by construction, the location score $\ell^{1}_{f;\vartheta^{(1)}_0}$ is orthogonal to the other two).

3. The double singularity case. 

3.1. Double singularity: a special role for Gaussian kernels. 

The double singularity phenomenon thus takes place if and only if

$$b\big(az\psi(z)-1\big)/\sigma = (2/a)\dot\psi(z) - 2\psi^2(z) \quad\text{a.e.}$$

(a.e. here and in the sequel means Lebesgue-a.e.) for some constant $b\in\mathbb{R}$ and a couple $(f,\Pi)$ such that $f\in\mathcal{E}_{\Psi}$ (see (2.4)). Rewriting this equation under the form

$$\dot\psi(z) = -\frac{ab}{2\sigma} + \frac{a^2b}{2\sigma}\,z\psi(z) + a\psi^2(z) \quad\text{a.e.} \tag{3.8}$$

yields a classical Riccati equation, whose solutions are of the form

$$\psi(z) = -\frac{ab}{2\sigma}\,z \tag{3.9}$$

or

$$\psi(z) = -\frac{ab}{2\sigma}\,z + \exp\Big(-\frac{a^2bz^2}{4\sigma}\Big)\Big/\Big(c - a\int_0^z\exp\Big(-\frac{a^2by^2}{4\sigma}\Big)dy\Big), \qquad b,c\in\mathbb{R}. \tag{3.10}$$

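That the linear solution (3.9) satisfies the Riccati equation (3.8) can be verified by direct substitution. Writing $k := ab/(2\sigma)$, an abbreviation introduced here for convenience, so that $a^2b/(2\sigma) = ak$, and taking $\psi(z) = -kz$:

```latex
\begin{aligned}
\dot\psi(z) &= -k, \\
-\frac{ab}{2\sigma} + \frac{a^{2}b}{2\sigma}\, z\,\psi(z) + a\,\psi^{2}(z)
  &= -k + ak\, z\,(-kz) + a\,k^{2}z^{2} \\
  &= -k - a k^{2} z^{2} + a k^{2} z^{2} \\
  &= -k .
\end{aligned}
```

Both sides of (3.8) thus equal $-k$ for every $z$, whatever the values of $a$, $b$ and $\sigma$.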
First, note that $b$ has to be negative, as otherwise $\varphi_f(z)=a\psi(z)$ would tend to $-\infty$ irrespective of the sign of $a$ when $z\to\infty$, implying positive values of $\dot f$ in the right tail of $f$, which is of course impossible for a density function. Furthermore, since both $z\mapsto a\int_0^z\exp\big(-\frac{a^2by^2}{4\sigma}\big)dy$ and $\psi$ are odd, the constant $c$ in (3.10) has to be zero. By (2.4), the natural parameter space $A$ for the exponential family $\mathcal{E}_{\Psi}$ associated with the mapping $\psi$ of (3.10) then consists of the set of values of $a$ for which the integral

$$\int_{-\infty}^{\infty}\exp(-a\Psi(z))\,dz = \int_{-\infty}^{\infty}\exp\Big(\frac{a^2b}{4\sigma}z^2 + \log\Big|\int_0^z\exp\Big(-\frac{a^2by^2}{4\sigma}\Big)dy\Big|\Big)dz = \int_{-\infty}^{\infty}\exp\Big(\frac{a^2b}{4\sigma}z^2\Big)\Big|\int_0^z\exp\Big(-\frac{a^2by^2}{4\sigma}\Big)dy\Big|\,dz$$

is finite. After a change of variable involving the quantity $\sqrt{a^2|b|/(4\sigma)}$, this appears to be equivalent to the requirement

$$\int_{-\infty}^{\infty}\exp(-z^2)\Big|\int_0^z\exp(y^2)\,dy\Big|\,dz < \infty. \tag{3.11}$$

However, one easily can check that $\lim_{z\to\infty} z\exp(-z^2)\big|\int_0^z\exp(y^2)\,dy\big| = 1/2$, meaning that $\exp(-z^2)\big|\int_0^z\exp(y^2)\,dy\big|$ decays like $1/(2z)$ for large values of $z$ and hence is not integrable. It follows that (3.11) is impossible. Hence, the natural parameter space $A$ is empty, meaning that no symmetric kernel $f$ associated to the mapping $\psi$ of (3.10) can yield singular Fisher information. Therefore, the only admissible solution to (3.8) is (3.9).
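The limit used above can be illustrated numerically: the function $z\mapsto\exp(-z^2)\int_0^z\exp(y^2)\,dy$ is Dawson's integral, available in SciPy as `scipy.special.dawsn`. The sketch below is an illustration only; the evaluation points and the helper name `partial_mass` are arbitrary choices. It prints $z$ times the function and shows that the integral over $[10^2,10^4]$ is close to $\frac{1}{2}\log(100)$, as logarithmic divergence requires.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import dawsn

# dawsn(z) = exp(-z**2) * integral from 0 to z of exp(y**2) dy.
for z in (5.0, 50.0, 500.0):
    print(z, z * dawsn(z))


def partial_mass(lower, upper):
    """Integral of Dawson's function over [lower, upper]."""
    value, _ = quad(dawsn, lower, upper, limit=200)
    return value


# If the integrand decays like 1 / (2 z), this is close to 0.5 * log(upper / lower).
mass = partial_mass(1e2, 1e4)
print(mass, 0.5 * np.log(1e4 / 1e2))
```

Because the partial integrals keep growing with the upper limit, the integral in (3.11) cannot be finite.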

This finding is quite remarkable: combined with the fact that $f\in\mathcal{E}_{\Psi}$ (which is equivalent to $\varphi_f=a\psi$), it implies that double singularity only can occur for symmetric kernels $f$ such that $\varphi_f(z)=c_1z$ for some constant $c_1$, namely, for Gaussian kernels; those Gaussian kernels moreover should be combined with a skewing function $\Pi$ such that $\psi(z)=c_2z$ for some constant $c_2$.

While Fisher singularity arises as a mismatch between the symmetric kernel and the skewing function, and hence can occur with all possible symmetric kernels, the double singularity phenomenon is specific to the Gaussian kernel, hence to a well-determined subclass of generalized skew-normal distributions (in the sense of Loperfido 2004). This also implies that, under the assumptions made, $n^{1/4}$ consistency rates are achieved for all other skew-symmetric families subject to Fisher singularity.

We formalize that result in the following theorem. 



Theorem 3.1. Consider the skew-symmetric family defined in (1.1). Then,

(i) under Assumptions (A1) and (A2), the couple $(f,\Pi)$ leads to a skew-symmetric family subject to Fisher singularity at $\delta=0$ if and only if the symmetric kernel $f$ is related to the skewing function $\Pi$ via the fact that $f\in\mathcal{E}_{\Psi}$ (see (2.4));

(ii) under Assumption (A2$^{+}$), the couple $(f,\Pi)$ leads to a skew-symmetric family subject to the double singularity phenomenon if and only if the symmetric kernel $f$ is the normal kernel and the skewing function $\Pi$ moreover satisfies $\psi(z) := \partial_\delta\Pi(z,\delta)|_{\delta=0} = cz$ for some real constant $c$; the family then is a particular case of the generalized skew-normal family (Loperfido 2004).

This theorem completely characterizes the double singularity problem, hence complements the simple singularity characterization of Hallin and Ley (2012).

3.2. A singularity-free reparametrization. 

Still inspired by Rotnitzky et al. (2000), let us now proceed with this second singularity the way we did with the first one, producing a second, hopefully singularity-free, reparametrization. Since the Gaussian kernel $\phi$ is the only candidate for this double singularity phenomenon, we can limit ourselves to $f=\phi$. Moreover, we know from the previous section that $\psi(z)=c_2z$; hence, in view of the fact that $z=\varphi_{\phi}(z)=a\psi(z)$, we have $c_2=1/a$. Applying the same Gram-Schmidt process as in Section 2.2, but with the score for scale $\ell^{2}_{\phi;\vartheta^{(1)}_0}$ substituted for the score for location, we project $\ell^{3}_{\phi;\vartheta^{(1)}_0}$ onto the subspace orthogonal to $\ell^{1}_{\phi;\vartheta^{(1)}_0}$ and $\ell^{2}_{\phi;\vartheta^{(1)}_0}$. The resulting residual score for skewness then, as expected, is zero:

$$\frac{2}{a^2} - \frac{2}{a^2}\big((x-\mu)/\sigma\big)^2 - \sigma^{-1}\Big(\big((x-\mu)/\sigma\big)^2-1\Big)\,\frac{2\sigma^{-1}\int_{-\infty}^{\infty}(z^2-1)(a^{-2}-a^{-2}z^2)\phi(z)\,dz}{\sigma^{-2}\int_{-\infty}^{\infty}(z^2-1)^2\phi(z)\,dz} = 0.$$

Transposing, as in Section 2.2, this projection in terms of parameters leads to the reparametrization $\vartheta^{(2)} := (\mu^{(2)},\sigma^{(2)},\delta^{(2)})'$, where

$$\mu^{(2)} = \mu^{(1)} = \mu + 2\delta\sigma/a, \qquad \sigma^{(2)} = \sigma^{(1)} + \delta^2\,\frac{\mathrm{Cov}\big(\ell^{2}_{\phi;\vartheta^{(1)}_0},\ell^{3}_{\phi;\vartheta^{(1)}_0}\big)}{\mathrm{Var}\big(\ell^{2}_{\phi;\vartheta^{(1)}_0}\big)} = \sigma - 2\delta^2\sigma/a^2,$$

and

$$\delta^{(2)} = \delta^{(1)} = \delta.$$

In line with previous notations, we denote by $f^{\Pi}_{\vartheta^{(2)}}$ the resulting skew-symmetric density despite the fact that the symmetric kernel is $\phi$. It is easy to check that our reparametrization, in the skew-normal case, coincides with that of Chiogna (2005).
This second reparametrization solely affects the scale parameter, but again cancels the score for skewness. Thus, derivatives of order three with respect to $\delta^{(2)}=\delta$ come into the picture, which eventually will lead to $n^{1/6}$ consistency rates. This, however, requires a reinforcement of Assumption (A2$^{+}$).

Assumption (A2$^{++}$). Same as (A2$^{+}$), but now (i) the mapping $(z,\delta)\mapsto\Pi(z,\delta)$ is three times continuously differentiable at $(z,0)$ for all $z\in\mathbb{R}$; (ii) letting $\Upsilon(z) := \partial^3_\delta\Pi(z,\delta)|_{\delta=0}$, the quantity $\int_{-\infty}^{\infty}\big(\frac{8}{3a^3}z^3-\frac{8}{a^3}z+\frac{1}{3}\Upsilon(z)\big)^2\phi(z)\,dz$ is finite.

Assumption (A2$^{++}$)(i) ensures the existence of the third-order derivative $\partial^3_{\delta}f^{\Pi}_{\vartheta^{(2)}}(x)$ at $\vartheta^{(2)}_0 := (\mu,\sigma,0)' = \vartheta_0$, while Assumption (A2$^{++}$)(ii) guarantees finiteness of the corresponding covariance matrix. Also note that the mixed derivative $\partial_z\partial^2_\delta\Pi(z,\delta)|_{\delta=0}$ vanishes by definition of skewing functions, and that $\partial^2_z\partial_\delta\Pi(z,\delta)|_{\delta=0} = \partial^2_z\psi(z)$ vanishes for all $z$, since we are dealing (Theorem 3.1(ii)) with skewing functions such that $\psi(z)=z/a$ is linear. These facts greatly simplify calculations.

Assumption (A2$^{++}$) thus implies, for this second reparametrization, the existence, at $\vartheta^{(2)}_0$, of a third-order score vector $\ell_{\phi;\vartheta^{(2)}_0}$ with finite covariance matrix $\Gamma_{\phi;\vartheta^{(2)}_0}$, enjoying the same properties as the second-order score described in Section 2.3, now with rates $n^{1/6}$. Elementary algebra yields, with $z := \sigma^{-1}(x-\mu)$,

$$\ell_{\phi;\vartheta^{(2)}_0}(x) = \begin{pmatrix}\sigma^{-1}z\\ \sigma^{-1}(z^2-1)\\ \frac{8}{3a^3}z^3-\frac{8}{a^3}z+\frac{1}{3}\Upsilon(z)\end{pmatrix}$$

and

$$\Gamma_{\phi;\vartheta^{(2)}_0} := \sigma^{-1}\int_{-\infty}^{\infty}\ell_{\phi;\vartheta^{(2)}_0}(x)\,\ell'_{\phi;\vartheta^{(2)}_0}(x)\,\phi\big(\sigma^{-1}(x-\mu)\big)\,dx.$$

If we assume, as in Section 2.3, that $\Gamma_{\phi;\vartheta^{(2)}_0}$ has full rank, denoting by $X_1,\ldots,X_n$ an i.i.d. sample of size $n$ from $f^{\Pi}_{\vartheta^{(2)}}$, the score vector provides a linear term to the Taylor expansion of the log-likelihood, as well as a Lagrange multiplier-type test of the null hypothesis of symmetry (in the generalized skew-normal family under study), based on the quadratic test statistic

[display equation missing in the source]

where $\hat\vartheta$ is, under the null hypothesis of symmetry, a root-$n$ consistent estimator of location and scale. The consistency/contiguity rate for $\delta$ (still, at $\delta=0$) is $n^{1/6}$, and the same comments as in Section 2.3 are in order. The particular case of the skew-normal family is studied in full detail in Hallin, Ley and Monti (2012).



4. Higher-order singularities. 

It may happen, however, that $\Gamma_{\phi;\vartheta^{(2)}_0}$ in turn is singular, the new third-order score for skewness $\ell^{3}_{\phi;\vartheta^{(2)}_0}$ being (at $\vartheta^{(2)}_0$) a linear combination of the scores for location $\ell^{1}_{\phi;\vartheta^{(2)}_0}$ and scale $\ell^{2}_{\phi;\vartheta^{(2)}_0}$. If this occurs, one has to go yet one step further with the approximation of log-likelihoods, assuming the existence of fourth-order derivatives and ending up with $n^{1/8}$ consistency/contiguity rates. That $n^{1/8}$ rate, however, as we shall see, is the worst possible one. Since this last derivation is not the main aim of this paper, we deliberately lighten the exposition and spare the reader the computational details and intermediate steps, which have been described at sufficient length in the previous cases.

In order for $\ell^{3}_{\phi;\vartheta^{(2)}_0} = \frac{8}{3a^3}z^3-\frac{8}{a^3}z+\frac{1}{3}\Upsilon(z)$ to be a linear combination of $\ell^{1}_{\phi;\vartheta^{(2)}_0} = z/\sigma$ and $\ell^{2}_{\phi;\vartheta^{(2)}_0} = (z^2-1)/\sigma$, $\Upsilon(z)$ necessarily has to be of the form $\alpha_1(-1+z^2)+\alpha_2z+\alpha_3z^3$, with $\alpha_1,\alpha_2\in\mathbb{R}$ and $\alpha_3=-\frac{8}{a^3}$ in order to annihilate the term in $z^3$. This condition on the third derivative with respect to $\delta$ thus characterizes what we would call a triple singularity (the result is formally stated in Theorem 4.1 at the end of this section). It is quite easy to construct examples suffering from this peculiarity; see Section 5.4.
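The linear combination can be written out explicitly. Taking the third-order score for skewness in the form $\frac{8}{3a^3}z^3-\frac{8}{a^3}z+\frac{1}{3}\Upsilon(z)$, with $z=(x-\mu)/\sigma$, and substituting $\Upsilon(z)=\alpha_1(-1+z^2)+\alpha_2z-\frac{8}{a^3}z^3$, one obtains the following identity; the coefficients in the last line are worked out here and are not quoted from the paper.

```latex
\begin{aligned}
\frac{8}{3a^{3}}z^{3}-\frac{8}{a^{3}}z+\frac{1}{3}\Upsilon(z)
 &= \frac{8}{3a^{3}}z^{3}-\frac{8}{a^{3}}z
    +\frac{\alpha_1}{3}(z^{2}-1)+\frac{\alpha_2}{3}z-\frac{8}{3a^{3}}z^{3} \\
 &= \Big(\frac{\alpha_2}{3}-\frac{8}{a^{3}}\Big)z+\frac{\alpha_1}{3}(z^{2}-1) \\
 &= \sigma\Big(\frac{\alpha_2}{3}-\frac{8}{a^{3}}\Big)\frac{z}{\sigma}
    +\frac{\sigma\alpha_1}{3}\,\frac{z^{2}-1}{\sigma}.
\end{aligned}
```

The cubic terms cancel exactly when $\alpha_3=-8/a^3$, and what remains is a combination of the location score $z/\sigma$ and the scale score $(z^2-1)/\sigma$.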






At this stage, the by now familiar machinery (new singularity, Gram-Schmidt orthogonalization of scores, reparametrization, new higher-order score for $\delta$) applies, leading after some direct manipulations to the reparametrization $\vartheta^{(3)} := (\mu^{(3)},\sigma^{(3)},\delta^{(3)})'$, with

[display equation missing in the source]

Since this reparametrization annihilates the third-order score for skewness, we need to take fourth-order derivatives, which requires the following strengthening of Assumption (A2$^{++}$).

Assumption (A2$^{+++}$). Same as (A2$^{++}$), but now the mapping $(z,\delta)\mapsto\Pi(z,\delta)$ is four times continuously differentiable at $(z,0)$, $z\in\mathbb{R}$.

Let us remark that, as will be seen below, we do not need to assume finiteness of Fisher information for skewness, as it is always finite after this third reparametrization. Clearly, as in all previous cases, both the location score $\ell^{1}_{\phi;\vartheta^{(3)}_0}$ and the scale score $\ell^{2}_{\phi;\vartheta^{(3)}_0}$ remain the same as in the original parametrization, and the new fourth-order score for skewness, for skewing functions such that $\partial^3_\delta\Pi(z,\delta)|_{\delta=0} = \alpha_1(-1+z^2)+\alpha_2z-\frac{8}{a^3}z^3$, becomes (after very lengthy but elementary calculations)

[display equation missing in the source]

One again easily can check that this quantity is centered under $\vartheta^{(3)}_0=\vartheta_0=(\mu,\sigma,0)'$. The interesting feature here is that the term $\big(\frac{x-\mu}{\sigma}\big)^4$ can by no means be annihilated, and hence hampers any linear combination with the location and scale scores. Thus, the resulting Fisher information matrix (whose finiteness is obvious)

[display equation missing in the source]

cannot be singular, which in turn implies that $n^{1/8}$ rates of convergence are the worst possible. The structural reason behind this result lies in the fact that, by definition of skewing functions, $\partial^4_\delta\Pi(z,\delta)|_{\delta=0}$ equals zero, hence cannot interfere in the fourth derivative, contrary to $\partial^3_\delta\Pi(z,\delta)|_{\delta=0}$, which plays the crucial role in annihilating the third-order derivative.

Those results are summarized in the following theorem, which complements Theorem 3.1.

Theorem 4.1. Consider the skew-symmetric family defined in (1.1). Then,

(i) under Assumption (A2$^{++}$), the couple $(f,\Pi)$ leads to a skew-symmetric family subject to the triple singularity phenomenon if and only if the symmetric kernel $f$ is the normal kernel $\phi$ and the skewing function $\Pi$ moreover satisfies $\psi(z) := \partial_\delta\Pi(z,\delta)|_{\delta=0} = z/a$ for some non-zero real constant $a$ and $\Upsilon(z) := \partial^3_\delta\Pi(z,\delta)|_{\delta=0} = \alpha_1(-1+z^2)+\alpha_2z-\frac{8}{a^3}z^3$ for some real constants $\alpha_1$ and $\alpha_2$, both possibly zero.

(ii) under Assumption (A2$^{+++}$), the couple $(f,\Pi)$ leads to no skew-symmetric family subject to a fourfold/quadruple singularity phenomenon.

We conclude this section by noting that, in most cases (including all classical skewing functions described in Section 5.3 hereafter), $\Upsilon$ is an odd function, implying some simplifications in the above expressions (namely, $\alpha_1$ then equals 0), but clearly the final outcome remains unchanged.

5. Examples. 

In this section, we illustrate our findings on basis of some well-known examples 
of the literature. Our presentation goes crescendo: starting, for the sake of com- 
pleteness, with singularity-free families, we consider simple, double, and finally triple 
singularities. 

5. 1 . Singularity-free families. 

Famous singularity-free examples comprise, inter alia, the skew-exponential power
distributions of Azzalini (1986) with pdf
$2c^{-1}\exp(-|z|^{\alpha}/\alpha)\,\Phi\big(\delta\,\mathrm{sign}(z)|z|^{\alpha/2}(2/\alpha)^{1/2}\big)$
for $\alpha > 1$ and $c = 2\alpha^{1/\alpha-1}\Gamma(1/\alpha)$, and the skew-$t$ distributions of Azzalini and
Capitanio (2003) with pdf $2t_\nu(z)T_{\nu+1}\big(\delta z(\nu+1)^{1/2}(z^2+\nu)^{-1/2}\big)$, where $t_\nu$ and $T_\nu$
respectively stand for the pdf and cdf of a standard Student distribution with $\nu$ degrees
of freedom. These examples are discussed at length in Hallin and Ley (2012), to which
we refer for details. In that same paper, an example of a skewing function for
which no mismatching symmetric kernel exists is given, namely $\Pi(z,\delta) = \Pi(\delta\sin(z))$
with $\Pi : \mathbb{R} \to [0,1]$ a differentiable function satisfying $\Pi(-y) + \Pi(y) = 1$ for all $y \in \mathbb{R}$
and such that $\dot{\Pi}(0) := d\Pi(y)/dy|_{y=0}$ exists and differs from zero.
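
As a quick numerical sanity check of the skew-exponential power density quoted above, the following sketch (standard library only) verifies that the stated constant $c$ normalizes the density and that $\alpha = 2$ recovers the skew-normal pdf $2\phi(z)\Phi(\delta z)$. The function names `sep_pdf` and `trapezoid`, as well as the parameter values, are illustrative choices of ours and do not come from the cited papers.

```python
import math


def normal_cdf(y: float) -> float:
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def sep_pdf(z: float, delta: float, alpha: float) -> float:
    """Skew-exponential power pdf at z (location 0, scale 1), for alpha > 1."""
    c = 2.0 * alpha ** (1.0 / alpha - 1.0) * math.gamma(1.0 / alpha)
    kernel = math.exp(-(abs(z) ** alpha) / alpha) / c
    # sign(z) * |z|^(alpha/2), rescaled as in the skewing function
    signed_power = math.copysign(abs(z) ** (alpha / 2.0), z)
    return 2.0 * kernel * normal_cdf(delta * signed_power * math.sqrt(2.0 / alpha))


def trapezoid(func, lower: float, upper: float, steps: int) -> float:
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


# Total mass for an arbitrary illustrative choice alpha = 1.5, delta = 2.
mass = trapezoid(lambda z: sep_pdf(z, 2.0, 1.5), -40.0, 40.0, 80_000)

# For alpha = 2 the formula should coincide with the skew-normal pdf.
z0, delta0 = 0.8, 1.3
skew_normal = (2.0 * math.exp(-z0 ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
               * normal_cdf(delta0 * z0))
gap = abs(sep_pdf(z0, delta0, 2.0) - skew_normal)
```

Here `mass` should be numerically close to one and `gap` close to zero; the check says nothing about Fisher information and only confirms that the formula is read correctly.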

5.2. Simple singularities. 

As shown in Hallin and Ley (2012), the easiest-to-construct mismatching skewing
function for a given symmetric kernel $f$ is of the form $\Pi(\delta\varphi_f(z))$, with $\Pi$ as described
above. For any symmetric kernel $f$, it is readily seen that the location and skewness
scores then are collinear.

Under the assumptions made, double singularity requires the additional assumption
that $\ddot{\Pi}(0) := d^2\Pi(y)/(dy)^2|_{y=0}$ exists and, by construction, equals zero.
Theorem 3.1 then tells us that among the pdfs $2f(z)\Pi(\delta\varphi_f(z))$ only the skew-normal,
obtained for $f = \phi$, suffers from the double singularity. Thus all non-Gaussian kernels
$f$ yield examples of simple singularities.
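
The collinearity of the location and skewness scores can be illustrated numerically. The sketch below assumes that $\varphi_f$ denotes the location score $-\dot{f}/f$ of the kernel (our reading of the notation), and picks, purely for illustration, the logistic kernel together with $\Pi = \Phi$; the helper names are hypothetical. Both scores are approximated by central finite differences at $\mu = 0$, $\delta = 0$ with the scale fixed at one, and their ratio should then be the same constant $2\dot{\Pi}(0) = \sqrt{2/\pi}$ at every $x$.

```python
import math


def logistic_pdf(z: float) -> float:
    """Standard logistic density, written so that it cannot overflow."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2


def logistic_location_score(z: float) -> float:
    """-f'(z)/f(z) for the logistic kernel, which equals tanh(z/2)."""
    return math.tanh(z / 2.0)


def normal_cdf(y: float) -> float:
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def log_density(x: float, mu: float, delta: float) -> float:
    """log of 2 f(z) Pi(delta * phi_f(z)) with z = x - mu (scale fixed at 1)."""
    z = x - mu
    return (math.log(2.0) + math.log(logistic_pdf(z))
            + math.log(normal_cdf(delta * logistic_location_score(z))))


def central_difference(func, point: float, step: float = 1e-6) -> float:
    return (func(point + step) - func(point - step)) / (2.0 * step)


def score_ratio(x: float) -> float:
    """Skewness score divided by location score, both at mu = 0, delta = 0."""
    location = central_difference(lambda mu: log_density(x, mu, 0.0), 0.0)
    skewness = central_difference(lambda d: log_density(x, 0.0, d), 0.0)
    return skewness / location


ratios = [score_ratio(x) for x in (-2.0, -0.5, 0.7, 3.0)]
expected_ratio = math.sqrt(2.0 / math.pi)  # 2 * Pi'(0) when Pi is the normal cdf
```

A ratio that does not depend on $x$ is exactly what collinearity of the two scores means; the kernel here is non-Gaussian, so this is an instance of a simple singularity.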

5.3. Double singularities. 

Concerning the double singularity, a prominent example is of course Azzalini's
skew-normal family, with pdf $2\phi(z)\Phi(\delta z)$. Let us briefly show that higher-order
singularities are excluded in that family. Straightforward calculations yield $a = \sqrt{2\pi}$ and
$\Upsilon(z) = -(2\pi)^{-1/2}z^3$, which is different from $-\frac{8}{a^3}z^3 = -(2/\pi)^{3/2}z^3$; hence Theorem 4.1
readily yields the well-known result of $n^{1/6}$ rates of convergence for the skew-normal
distribution. For the sake of completeness, we also provide for this famous example
the corresponding score for skewness, which equals
$\frac{4-\pi}{3\pi\sqrt{2\pi}}z^3 - \frac{4-\pi}{\pi\sqrt{2\pi}}z$.

Nadarajah and Kotz (2003) propose another family of skew densities generated
by the normal kernel, with pdfs of the form $2\phi(z)G(\delta z)$ where $G$ is some univariate
symmetric cdf. They call skew normal-G the resulting families of densities. Their
definition includes as particular cases the skew normal-normal model, the skew normal-t,
the skew normal-Cauchy, the skew normal-Laplace, the skew normal-logistic and the
skew normal-uniform families. Theorem 3.1 tells us that all skew normal-G models
suffer from the double singularity, a fact that, except of course for the skew
normal-normal (which, up to an additional scale parameter, coincides with the classical
skew-normal), has never been noticed. Consequently, these models have to be treated with
much care when used for inferential purposes. The problem with those families
obviously stems from the product $\delta z$ inside $G$; see Section 5.5 for further discussion of
such skewing functions.

5.4. Higher-order singularities.

Let us further analyze the families of Nadarajah and Kotz (2003). Assume that $G$ is
three times continuously differentiable. Elementary calculations show that $a = 1/g(0)$,
where $g(z) := dG(z)/dz$, and $\Upsilon(z) = \ddot{g}(0)z^3$. We know from Theorem 4.1 that a
triple singularity can only occur if $\ddot{g}(0) = -\frac{8}{a^3} = -8(g(0))^3$. Among the distributions
considered by Nadarajah and Kotz (2003), this equality holds for the skew
normal-logistic only, for which $g(0) = 1/4$ and $\ddot{g}(0) = -1/8$. Thus, while all their other
skew normal-G distributions have $n^{1/6}$ rates of convergence, the skew normal-logistic
requires the worst possible rates, namely $n^{1/8}$ rates.
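
The condition $\ddot{g}(0) = -8(g(0))^3$ is easy to check case by case. The following sketch does so for three of the symmetric densities $g$ mentioned above, using the closed-form values of $g(0)$ and $\ddot{g}(0)$; the function names are hypothetical, and the Laplace and uniform cases are left out because their densities do not meet the smoothness assumption at the origin.

```python
import math

# (g(0), g''(0)) for three symmetric densities g = G', in closed form.
DENSITY_AT_ZERO = {
    "normal": (1.0 / math.sqrt(2.0 * math.pi), -1.0 / math.sqrt(2.0 * math.pi)),
    "logistic": (0.25, -0.125),
    "Cauchy": (1.0 / math.pi, -2.0 / math.pi),
}


def has_triple_singularity(g0: float, g2: float) -> bool:
    """Check g''(0) = -8 g(0)^3, i.e. Upsilon(z) = -(8/a^3) z^3 with a = 1/g(0)."""
    return math.isclose(g2, -8.0 * g0 ** 3, rel_tol=1e-12)


def consistency_rate(g0: float, g2: float) -> str:
    """Consistency rate for the skewness parameter of the skew normal-G family."""
    return "n^(1/8)" if has_triple_singularity(g0, g2) else "n^(1/6)"


rates = {name: consistency_rate(g0, g2)
         for name, (g0, g2) in DENSITY_AT_ZERO.items()}
```

Only the logistic entry satisfies the equality, in line with the discussion above.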

Finally, consider the "lifted" skew-normal distribution with pdf

$$2\phi(z)\Phi\big(\delta z - (4-\pi)(6\pi)^{-1}(\delta z)^3\big). \qquad (5.12)$$

Here, $a = \sqrt{2\pi}$ and $\Upsilon(z) = -(2/\pi)^{3/2}z^3 = -\frac{8}{(\sqrt{2\pi})^3}z^3 = -\frac{8}{a^3}z^3$, entailing, by
Theorem 4.1, a triple singularity and hence $n^{1/8}$ rates of convergence. Note that this
distribution is part of the so-called flexible generalized skew-normal distributions
defined in Ma and Genton (2004). More generally, in that paper, the authors have
proposed flexible skew-symmetric distributions with skewing functions of the form
$\Pi(z,\delta) := \Pi(H_\ell(\delta z))$, with $\Pi$ as defined in Section 5.1 and $H_\ell$ an odd polynomial of
order $\ell$ (meaning that the polynomial only contains odd terms). Since, in the first
four derivatives, all terms of the form $(\delta z)^s$ with odd $s \ge 5$ do not play any role,
one can directly construct an infinity of flexible generalized skew-normal distributions
suffering from triple singularity: take, for instance, an odd polynomial $H_\ell$ with the
terms in $\delta z$ and $(\delta z)^3$ as in (5.12), such as

$$2\phi(z)\Phi\Big(\delta z - (4-\pi)(6\pi)^{-1}(\delta z)^3 + \sum_{i=2}^{\ell}\alpha_{2i+1}(\delta z)^{2i+1}\Big)$$

with $\alpha_{2i+1} \in \mathbb{R}$ and $2 \le \ell \in \mathbb{N}$.
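
The coefficient $(4-\pi)(6\pi)^{-1}$ in the lifted skew-normal can be double-checked numerically. For a skewing function $\Phi(\delta z - c(\delta z)^3)$, a short chain-rule calculation (ours, not quoted from the paper) gives $\Upsilon(z) = -\phi(0)(1+6c)z^3$. The sketch below compares this with the coefficient $-8/a^3$ required for a triple singularity, and confirms it with a finite-difference estimate of the third derivative in $\delta$ at $z = 1$. All names are illustrative.

```python
import math

PHI_AT_ZERO = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal pdf at 0


def normal_cdf(y: float) -> float:
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def lifted_skewing(z: float, delta: float, c: float) -> float:
    """Pi(z, delta) = Phi(delta*z - c*(delta*z)**3)."""
    u = delta * z
    return normal_cdf(u - c * u ** 3)


def third_delta_derivative(z: float, c: float, step: float = 1e-2) -> float:
    """Central finite-difference estimate of the third delta-derivative at delta = 0."""
    def f(d: float) -> float:
        return lifted_skewing(z, d, c)
    return (f(2 * step) - 2 * f(step) + 2 * f(-step) - f(-2 * step)) / (2 * step ** 3)


a = math.sqrt(2.0 * math.pi)                       # psi(z) = z / a
c_lifted = (4.0 - math.pi) / (6.0 * math.pi)       # cubic coefficient in (5.12)
required = -8.0 / a ** 3                           # z^3 coefficient for a triple singularity
closed_form = -PHI_AT_ZERO * (1.0 + 6.0 * c_lifted)
numerical = third_delta_derivative(1.0, c_lifted)  # estimate of Upsilon(1)
skew_normal_coefficient = -PHI_AT_ZERO             # the case c = 0
```

With `c_lifted` the closed-form and required coefficients agree, whereas `c = 0` (the ordinary skew-normal) gives a different value, which is why that family stops at the double singularity.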

5.5. A brief discussion of skewing functions of the form $\Pi(z,\delta) = \Pi(\delta z)$.

As announced in the Introduction, we conclude this paper with a few comments
on the most frequent type of skewing function, namely $\Pi(z,\delta) = \Pi(\delta z)$ with
$\Pi : \mathbb{R} \to [0,1]$ satisfying $\Pi(-y) + \Pi(y) = 1$ for all $y \in \mathbb{R}$ (and satisfying the required
differentiability conditions). Such functions are the most natural examples of a skewing
function such that $\psi(z)$ is linear, yielding an extremely risky combination with the
Gaussian kernel $\phi$.

The original skew-normal family of Azzalini (1985) is based on $\Pi = \Phi$; in
conjunction with a Gaussian kernel, the same type of skewing function has been used, inter
alia, by

- Azzalini and Capitanio (1999) for their skew-symmetric densities of the form
$2f(z)F(\delta z)$, with $F$ the cdf corresponding to $f$;

- Gupta et al. (2002) for their skew-uniform, skew-t, skew-Cauchy, skew-Laplace
and skew-logistic distributions, which all are special cases of Azzalini and
Capitanio (1999)'s construction;

- Nadarajah and Kotz (2003) for their skew normal-G distributions, as described
in the previous sections; and by

- Gómez et al. (2007) for their skew g-normal densities $2g(z)\Phi(\delta z)$ where, contrary
to the skew normal-G distributions, normality is present in the skewing function
and not in the symmetric kernel.

As shown in this paper, skewing functions of the form $\Pi(\delta z)$ are harmless whenever the
symmetric kernel is not Gaussian. In view of this, the skew g-normal distributions (free
of any singularity except for $g = \phi$) are inferentially preferable to the skew normal-G
ones (which exhibit at least a double singularity). The peculiarities of the skew-normal
distribution, which belongs to all of the above-cited classes of distributions, have been
discussed at length in the literature; we hope that this paper sheds some light on the
structural reasons behind these inferential drawbacks, and warns the reader about the
dangers of combining a Gaussian kernel with a skewing function of the form $\Pi(\delta z)$.

Azzalini and Capitanio (2003) clearly were aware of the dangers of using $\Pi(z,\delta)$
of the form $\Pi(\delta z)$: in reaction to a referee's remark, they write "A reviewer of this
paper has remarked that, if we set $d = 1$, density (26) does not reduce to the form
$2t_1(y;\nu)T_1(\alpha y;\nu)$, which seems to be the 'most natural' univariate form of skew t
density generated by Lemma 1 of Azzalini (1985)", explain why the skewing functions
they are proposing for their skew-t densities are not of that type, and suggest that the
choice of $\Phi(\delta z)$ for the original skew-normal perhaps was not the best one. Our results
amply justify their concern, and confirm the clear-sightedness of their diagnosis.

Acknowledgements 

Marc Hallin is also a member of the Académie Royale de Belgique and ECORE, and
an extra-muros Fellow of CentER, Tilburg University. His research is supported by 
the Sonderforschungsbereich "Statistical modeling of nonlinear dynamic processes" 
(SFB 823) of the German Research Foundation (Deutsche Forschungsgemeinschaft) 
and a Discovery Grant of the Australian Research Council. 






Christophe Ley thanks the Fonds National de la Recherche Scientifique, Communauté
française de Belgique, for support via a Mandat de Chargé de Recherche FNRS.

References 

[1] Arnold, B. C. and Beaver, R. J. (2000) The skew-Cauchy distribution. Statist.
Probab. Lett., 49, 285-290.

[2] Azzalini, A. (1985) A class of distributions which includes the normal ones. Scand.
J. Statist., 12, 171-178.

[3] Azzalini, A. (1986) Further results on a class of distributions which includes the 
normal ones. Statistica, 46, 199-208. 

[4] Azzalini, A. (2005) The skew-normal distribution and related multivariate fami- 
lies (with discussion). Scand. J. Statist., 32, 159-188. 

[5] Azzalini, A. and Capitanio, A. (1999) Statistical applications of the multivariate
skew-normal distributions. J. R. Stat. Soc. B, 61, 579-602.

[6] Azzalini, A. and Capitanio, A. (2003) Distributions generated by perturbation of 
symmetry with emphasis on a multivariate skew-t distribution. J. R. Stat. Soc. 
B, 65, 367-389. 

[7] Azzalini, A. and Genton, M. G. (2008) Robust likelihood methods based on the 
skew-t and related distributions. International Statistical Review, 76, 106-129. 

[8] Chiogna, M. (2005) A note on the asymptotic distribution of the maximum
likelihood estimator for the scalar skew-normal distribution. Stat. Methods Appl., 14,
331-341.

[9] Cox, D. R. and Hinkley, D. V. (1974) Theoretical statistics. London: Chapman 
& Hall. 

[10] DiCiccio, T. J. and Monti, A. C. (2004) Inferential aspects of the skew- 
exponential power distribution. J. Amer. Statist. Assoc., 99, 439-450. 

[11] Genton, M. G. (2004) Skew-elliptical Distributions and their Applications: a
Journey beyond Normality. Boca Raton, FL: Chapman and Hall/CRC.






[12] Gómez, H. W., Venegas, O. and Bolfarine, H. (2007) Skew-symmetric
distributions generated by the distribution function of the normal distribution.
Environmetrics, 18, 395-407.

[13] Gupta, A. K., Chang, F. C. and Huang, W. J. (2002) Some skew-symmetric 
models. Random Oper. Stochastic Equations, 10, 133-140. 

[14] Hallin, M. and Ley, C. (2012) Skew-symmetric distributions and Fisher informa- 
tion - a tale of two densities. Bernoulli, 18, 747-763. 

[15] Hallin, M., Ley, C. and Monti, A. C. (2012) Le Cam optimal tests for normality 
against skew-normal alternatives. Work in progress. 

[16] Lee, L. F. and Chesher, A. (1986) Specification testing when score test statistics
are identically zero. Journal of Econometrics, 31, 121-149. 

[17] Ley, C. (2012) Skew distributions. In A. El-Shaarawi and W. Piegorsch, Editors, 
Statistical Theory and Methods, Encyclopedia of Environmetrics, 2nd edition, 
Wiley, New York, to appear. 

[18] Ley, C. and Paindaveine, D. (2010) On the singularity of multivariate
skew-symmetric models. J. Multivariate Anal., 101, 1434-1444.

[19] Loperfido, N. (2004) Generalized skew-normal distributions. In M. G. Genton,
ed., Skew-elliptical Distributions and their Applications: a Journey beyond
Normality, Boca Raton, FL: Chapman and Hall/CRC, 65-80.

[20] Ma, Y. and Genton, M. G. (2004) Flexible class of skew-symmetric distributions. 
Scand. J. Statist., 31, 459-468. 

[21] Nadarajah, S. and Kotz, S. (2003) Skewed distributions generated by the normal
kernel. Statist. Probab. Lett., 65, 269-277.

[22] Pewsey, A. (2000) Problems of inference for Azzalini's skew-normal distribution.
J. Appl. Statist., 27, 859-870.

[23] Rotnitzky, A., Cox, D. R., Bottai, M. and Robins, J. (2000) Likelihood-based 
inference with singular information matrix. Bernoulli, 6, 243-284. 

[24] Wang, J., Boyer, J. and Genton, M. G. (2004) A skew-symmetric representation 
of multivariate distribution. Statist. Sinica, 14, 1259-1270. 






